It’s helpful to generate random data in the shape of the vendor models when integrating with third parties.

Here’s the first step in building a service that generates random data in the shape of a specific model.

Model Generation

There’s a tool that can take an OpenAPI spec and generate a Pydantic model. It’s called datamodel-code-generator and you can read about it here. A URL that points to an API spec can be passed and the output is a Python module containing the Pydantic models.

Many options are available to control the output. Here’s a simple example from this spec:

(venv) morgan@LAPTOP-O1G4SPR0:~/d/lab/python/random_api$ datamodel-codegen --url "https://raw.githubusercontent.com/OAI/OpenAPI-Specification/main/examples/v3.1/webhook-example.yaml"
The input file type was determined to be: openapi
This can be specificied explicitly with the `--input-file-type` option.
# generated by datamodel-codegen:
#   filename:  https://raw.githubusercontent.com/OAI/OpenAPI-Specification/main/examples/v3.1/webhook-example.yaml
#   timestamp: 2023-12-01T22:17:43+00:00

from __future__ import annotations

from typing import Optional

from pydantic import BaseModel


class Pet(BaseModel):
    id: int
    name: str
    tag: Optional[str] = None

Randomness

Polyfactory (formerly Pydantic Factories) can create instances of Pydantic models, assigning random data to the fields. It’s fairly configurable too. Here’s a (messy) example using the model above:

In [12]: class Pet(BaseModel):
    ...:     id: int
    ...:     name: str
    ...:     tag: Optional[str] = None

In [13]: from polyfactory.factories.pydantic_factory import ModelFactory

In [14]: class PetFactory(ModelFactory):
    ...:     __model__ = Pet
    ...:

In [15]: pet = PetFactory.build()

In [16]: print(pet)
id=7612 name='yqdjJbkFbaRUjuGSrQmD' tag='qdjsdoEIOrmVeNaffMHy'
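To make the idea concrete, here is a stdlib-only sketch of building an instance from type hints. It is not how Polyfactory is implemented. It uses a dataclass stand-in for the Pydantic model, and the helpers `random_value` and `build` are hypothetical names invented for this illustration.

```python
import random
import string
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args, get_origin, get_type_hints


@dataclass
class Pet:
    # Dataclass stand-in for the generated Pydantic model.
    id: int
    name: str
    tag: Optional[str] = None


def random_value(tp, rng):
    """Return a random value for the few types this sketch supports."""
    if get_origin(tp) is Union:
        # Optional[X] is Union[X, None]; always fill it with a value of type X.
        inner = [arg for arg in get_args(tp) if arg is not type(None)]
        return random_value(inner[0], rng)
    if tp is int:
        return rng.randint(0, 9999)
    if tp is str:
        return "".join(rng.choices(string.ascii_letters, k=20))
    raise TypeError(f"unsupported field type: {tp!r}")


def build(cls, rng=None):
    """Build a dataclass instance with a random value for every field."""
    rng = rng or random.Random()
    hints = get_type_hints(cls)
    return cls(**{f.name: random_value(hints[f.name], rng) for f in fields(cls)})


pet = build(Pet, random.Random(0))
print(pet)
```

Passing a seeded `random.Random` makes the output repeatable, which is handy when a test needs the same "random" payload twice.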

Service

Putting those two things together gives you a service that can generate random data in the shape of a model provided by an OpenAPI spec. It works, but it’s a work in progress. Here’s the code.

Some Code Highlights

The model can be persisted for future use.

def create_model_from_url(url, persistor: BasePersistor) -> str:
    spec_id = uuid4().hex
    output_path = Path(TEMP_MODULE_PATH.name) / f"{spec_id}.py"

    # Run datamodel-codegen in-process to write the generated module.
    main([
        "--url", url,
        "--input-file-type", "openapi",
        "--output", str(output_path),
        "--output-model-type", "pydantic_v2.BaseModel",
        "--use-annotated",
    ])
    if not output_path.exists():
        raise RegistrationFailed(spec_id)
    model_str = output_path.read_text()

    # Rewrite the file only if the fixup changed anything.
    fixed_model_str = patterned_field_fixup(model_str)
    if fixed_model_str != model_str:
        output_path.write_text(fixed_model_str)

    persistor.write(spec_id, fixed_model_str)
    return spec_id
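For illustration, a persistor could be as small as the sketch below. Only `persistor.write(spec_id, ...)` appears in the code above, so the `read` method and the `InMemoryPersistor` class are assumptions made up for this example, not the service’s actual interface.

```python
from abc import ABC, abstractmethod
from typing import Dict


class BasePersistor(ABC):
    """Hypothetical interface; only `write` is taken from the code above."""

    @abstractmethod
    def write(self, spec_id: str, model_str: str) -> None:
        """Store the generated module source under its spec id."""

    @abstractmethod
    def read(self, spec_id: str) -> str:
        """Return the stored module source for a spec id."""


class InMemoryPersistor(BasePersistor):
    """Keeps model source in a dict; everything is lost on restart."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def write(self, spec_id: str, model_str: str) -> None:
        self._store[spec_id] = model_str

    def read(self, spec_id: str) -> str:
        return self._store[spec_id]


persistor = InMemoryPersistor()
persistor.write("abc123", "class Pet: ...")
print(persistor.read("abc123"))
```

A file-backed or bucket-backed implementation could follow the same two-method shape, so a spec would not need to be fetched and regenerated after a restart.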

The model is written to a temporary file and then imported (abusing the import system?) using a custom finder. This is rather quick when hosted on GCP App Engine, as the module is stored in the local in-memory file system.

def _add_finder(path: str) -> None:
    class CustomFinder(machinery.PathFinder):
        _path = [path]

        @classmethod
        def find_spec(cls, fullname, path=None, target=None):
            # Ignore the caller's path and always search the temp directory.
            return super().find_spec(fullname, cls._path, target)

    sys.meta_path.append(CustomFinder)
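Here is a self-contained sketch of that finder in use, under a few assumptions: the temp directory, the module name `generated_spec_demo`, and the one-class module source are all made up for the demo, and `add_finder` mirrors the function above rather than being the service’s exact code.

```python
import importlib
import sys
import tempfile
from importlib import machinery
from pathlib import Path


def add_finder(path: str) -> None:
    """Register a meta-path finder that only searches `path`."""

    class DirectoryFinder(machinery.PathFinder):
        _path = [path]

        @classmethod
        def find_spec(cls, fullname, path=None, target=None):
            # Always look in the configured directory, whatever the caller passes.
            return super().find_spec(fullname, cls._path, target)

    # Appended last, so it is only consulted after the normal finders fail.
    sys.meta_path.append(DirectoryFinder)


# Write a tiny "generated" module into a temp directory (demo values only).
module_dir = tempfile.mkdtemp()
module_name = "generated_spec_demo"
(Path(module_dir) / f"{module_name}.py").write_text("class Pet:\n    pass\n")

add_finder(module_dir)
# Drop cached directory listings so the new file is visible to the finder.
importlib.invalidate_caches()

module = importlib.import_module(module_name)
print(module.Pet.__module__)
```

Because the directory never goes on `sys.path`, only the custom finder can resolve modules stored there.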

You can specify which models you want from those available in the generated module. This filters for the selected models, if any.

    members = getmembers(module, inspect.isclass)
    # Keep only classes defined in the generated module itself.
    own = [(name, cls) for name, cls in members if cls.__module__ == spec_id]
    if models:
        # Match requested model names case-insensitively.
        target_models = [x.lower() for x in models]
        clzs = [cls for name, cls in own if name.lower() in target_models]
    else:
        clzs = [cls for _, cls in own]
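The filtering is easy to try in isolation. In this sketch the module is built in memory with `types.ModuleType` rather than generated from a spec, and `select_models`, `spec_demo`, and the class names are hypothetical stand-ins.

```python
import inspect
import types

# Stand-in for a generated module: two local classes plus one imported class.
SOURCE = """
from decimal import Decimal

class Pet:
    pass

class Owner:
    pass
"""


def select_models(module, module_name, models=None):
    """Return classes defined in `module`, optionally filtered by name."""
    members = inspect.getmembers(module, inspect.isclass)
    # Imported classes (Decimal here) report a different __module__.
    own = [(name, cls) for name, cls in members if cls.__module__ == module_name]
    if models:
        wanted = {m.lower() for m in models}
        return [cls for name, cls in own if name.lower() in wanted]
    return [cls for _, cls in own]


module = types.ModuleType("spec_demo")
exec(SOURCE, module.__dict__)

print([c.__name__ for c in select_models(module, "spec_demo")])
# -> ['Owner', 'Pet']
print([c.__name__ for c in select_models(module, "spec_demo", ["pet"])])
# -> ['Pet']
```

The `__module__` check is what keeps imported helpers such as `Decimal` out of the result.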

A new generic class inherits from Polyfactory’s ModelFactory, ensuring that all child factories create random data for optional fields. Because all model factories are created dynamically, setting this once on the base class simplifies that code.

    class MyModelFactory(Generic[T], ModelFactory[T]):
        __is_base_factory__ = True
        __allow_none_optionals__ = False
        def __init_subclass__(cls, *args: Any, **kwargs: Any) -> None:
            super().__init_subclass__(*args, **kwargs)
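The pattern itself, stripped of Polyfactory, looks roughly like this. It is a plain-Python sketch: `BaseFactory`, `make_factory`, and the attribute names are hypothetical stand-ins, not Polyfactory’s API.

```python
class BaseFactory:
    # Stand-in for the shared setting declared once on the generic base class.
    allow_none_optionals = False
    model = None


def make_factory(model):
    """Create a factory subclass for `model` at runtime."""
    return type(f"{model.__name__}Factory", (BaseFactory,), {"model": model})


class Pet:
    pass


class Owner:
    pass


factories = [make_factory(m) for m in (Pet, Owner)]
print([(f.__name__, f.allow_none_optionals) for f in factories])
# -> [('PetFactory', False), ('OwnerFactory', False)]
```

Every dynamically created subclass inherits the setting, so the code that creates factories never has to repeat it.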

Todo

  • It’s the first part of a service that can be configured to generate random data of a specific shape at specific or random intervals and quantities, super useful for testing integrations.
  • Lots of improvements needed here.