Generate Randomness for API Models
When integrating with third parties, it’s helpful to generate random data in the shape of the vendor’s models.
Here’s the first step in building a service that generates random data in the shape of a specific model.
Model Generation
There’s a tool that can take an OpenAPI spec and generate a Pydantic model. It’s called datamodel-code-generator and you can read about it here. You pass it a URL that points to an API spec, and the output is a Python module containing the Pydantic models.
Many options are available to control its behavior. Here’s a simple example from this spec:
(venv) morgan@LAPTOP-O1G4SPR0:~/d/lab/python/random_api$ datamodel-codegen --url "https://raw.githubusercontent.com/OAI/OpenAPI-Specification/main/examples/v3.1/webhook-example.yaml"
The input file type was determined to be: openapi
This can be specificied explicitly with the `--input-file-type` option.
# generated by datamodel-codegen:
# filename: https://raw.githubusercontent.com/OAI/OpenAPI-Specification/main/examples/v3.1/webhook-example.yaml
# timestamp: 2023-12-01T22:17:43+00:00
from __future__ import annotations

from typing import Optional

from pydantic import BaseModel


class Pet(BaseModel):
    id: int
    name: str
    tag: Optional[str] = None
Randomness
Polyfactory (formerly Pydantic Factories) can create instances of Pydantic models, assigning random data to the fields. It’s fairly configurable too. Here’s a (messy) example using the model above:
In [12]: class Pet(BaseModel):
    ...:     id: int
    ...:     name: str
    ...:     tag: Optional[str] = None

In [13]: from polyfactory.factories.pydantic_factory import ModelFactory

In [14]: class PetFactory(ModelFactory):
    ...:     __model__ = Pet
    ...:

In [15]: pet = PetFactory.build()

In [16]: print(pet)
id=7612 name='yqdjJbkFbaRUjuGSrQmD' tag='qdjsdoEIOrmVeNaffMHy'
Service
Putting those two things together gives you a service that can generate random data in the shape of a model provided by an OpenAPI spec. It works, but it’s a work in progress. Here’s the code.
Some Code Highlights
The model can be persisted for future use.
def create_model_from_url(url, persistor: BasePersistor) -> str:
    spec_id = uuid4().hex
    output_path = Path(TEMP_MODULE_PATH.name) / f"{spec_id}.py"
    main([
        "--url", url,
        "--input-file-type", "openapi",
        "--output", str(output_path),
        "--output-model-type", "pydantic_v2.BaseModel",
        "--use-annotated",
    ])
    if not output_path.exists():
        raise RegistrationFailed(spec_id)
    model_str = output_path.read_text()
    fixed_model_str = patterned_field_fixup(model_str)
    if fixed_model_str != model_str:
        output_path.write_text(fixed_model_str)
    persistor.write(spec_id, fixed_model_str)
    return spec_id
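As a rough illustration of what the persistor passed in above might look like, here is a minimal sketch. Only the write(spec_id, source) call appears in the code above; the read method and the InMemoryPersistor class are hypothetical names added for illustration, and the real service may store models quite differently.

```python
from abc import ABC, abstractmethod
from typing import Dict


class BasePersistor(ABC):
    """Stores generated model source code, keyed by spec id."""

    @abstractmethod
    def write(self, spec_id: str, model_source: str) -> None:
        ...

    @abstractmethod
    def read(self, spec_id: str) -> str:
        # Hypothetical counterpart to write(); not shown in the original code.
        ...


class InMemoryPersistor(BasePersistor):
    """Hypothetical persistor that keeps everything in a dict (handy for tests)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def write(self, spec_id: str, model_source: str) -> None:
        self._store[spec_id] = model_source

    def read(self, spec_id: str) -> str:
        return self._store[spec_id]


persistor = InMemoryPersistor()
persistor.write("0123abcd", "class Pet: ...")
print(persistor.read("0123abcd"))
```

Swapping in a persistor backed by a file system or a cloud bucket would then only require another subclass with the same two methods.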
The model is written to a temporary file and then imported (abusing the import system?) using a custom finder. This is rather quick when hosted on GCP App Engine, as the module is stored in the local in-memory file system.
def _add_finder(path: str) -> None:
    class CustomFinder(machinery.PathFinder):
        _path = [path]

        @classmethod
        def find_spec(cls, fullname, path=None, target=None):
            return super().find_spec(fullname, cls._path, target)

    sys.meta_path.append(CustomFinder)
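To see the finder trick in isolation, here is a small self-contained sketch using only the standard library. It writes a stand-in module to a temporary directory, registers a finder like the one above, and imports the module by name. The module name and its contents are made up for the example; in the service the file would be the generated Pydantic module.

```python
import importlib
import sys
import tempfile
from importlib import machinery
from pathlib import Path


def add_finder(path: str) -> None:
    """Register a finder that looks for modules only in `path`."""

    class CustomFinder(machinery.PathFinder):
        _path = [path]

        @classmethod
        def find_spec(cls, fullname, path=None, target=None):
            # Ignore the caller's search path and always search our directory.
            return super().find_spec(fullname, cls._path, target)

    # Appended last, so the standard finders are still consulted first.
    sys.meta_path.append(CustomFinder)


# Write a stand-in "generated" module to a temporary directory.
tmp_dir = tempfile.TemporaryDirectory()
module_name = "spec_0123abcd"  # hypothetical spec id
Path(tmp_dir.name, f"{module_name}.py").write_text(
    "class Pet:\n    name = 'rex'\n"
)

add_finder(tmp_dir.name)
importlib.invalidate_caches()
module = importlib.import_module(module_name)
print(module.Pet.name)
```

Because the finder is appended to the end of sys.meta_path, it only handles names the regular import machinery could not resolve, which keeps it from shadowing installed packages.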
You can specify which models you want from those available in the generated module. This filters for the selected models, if any; otherwise every class defined in the module is used.
members = getmembers(module, inspect.isclass)
if models:
    target_models = [x.lower() for x in models]
    clzs = [x[1] for x in members if x[1].__module__ == spec_id and x[0].lower() in target_models]
else:
    clzs = [x[1] for x in members if x[1].__module__ == spec_id]
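The filtering above can be tried out without any generated code. The sketch below builds a throwaway module in memory and applies the same idea: keep only classes whose __module__ matches the module name, then optionally narrow by a case-insensitive list of names. The select_models helper and the stand-in module are illustrative names, not part of the service.

```python
import inspect
import types


def select_models(module, module_name, models=None):
    """Return classes defined in `module`, optionally filtered by name."""
    members = inspect.getmembers(module, inspect.isclass)
    # Drop classes that were merely imported into the module.
    own = [(name, cls) for name, cls in members if cls.__module__ == module_name]
    if models:
        wanted = {m.lower() for m in models}
        own = [(name, cls) for name, cls in own if name.lower() in wanted]
    return [cls for _, cls in own]


# Build a stand-in for the generated module.
source = """
from decimal import Decimal  # imported class, should be filtered out

class Pet:
    pass

class Owner:
    pass
"""
module = types.ModuleType("spec_0123abcd")
exec(source, module.__dict__)

print([c.__name__ for c in select_models(module, "spec_0123abcd")])
print([c.__name__ for c in select_models(module, "spec_0123abcd", ["PET"])])
```

The __module__ check matters because generated modules import helper classes (BaseModel, enums, and so on) that would otherwise show up as candidates.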
A new generic class inherits from Polyfactory’s ModelFactory and sets __allow_none_optionals__ to False, ensuring that all child factories create random data for optional fields. Because all of the model factories are created dynamically, setting this once on a shared base class simplifies that code.
class MyModelFactory(Generic[T], ModelFactory[T]):
    __is_base_factory__ = True
    __allow_none_optionals__ = False

    def __init_subclass__(cls, *args: Any, **kwargs: Any) -> None:
        super().__init_subclass__(*args, **kwargs)
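To make the effect of that setting concrete, here is a toy stand-in written with only the standard library. It is not how Polyfactory works internally; it uses a dataclass instead of a Pydantic model, supports only int and str fields, and the 50% chance of None is an arbitrary assumption. It just shows the idea of a switch that decides whether optional fields may come back as None.

```python
import random
import string
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args, get_origin


def random_value(tp, allow_none_optionals: bool):
    """Produce a random value for a small set of supported types."""
    if get_origin(tp) is Union and type(None) in get_args(tp):
        # Optional[X]: maybe return None, otherwise generate an X.
        if allow_none_optionals and random.random() < 0.5:
            return None
        inner = next(a for a in get_args(tp) if a is not type(None))
        return random_value(inner, allow_none_optionals)
    if tp is int:
        return random.randint(0, 9999)
    if tp is str:
        return "".join(random.choices(string.ascii_letters, k=20))
    raise TypeError(f"unsupported field type: {tp!r}")


def build(model, allow_none_optionals: bool = True):
    """Create an instance of a dataclass with random field values."""
    # Relies on real (non-string) annotations, so no `from __future__ import annotations` here.
    values = {f.name: random_value(f.type, allow_none_optionals) for f in fields(model)}
    return model(**values)


@dataclass
class Pet:
    id: int
    name: str
    tag: Optional[str] = None


print(build(Pet))                              # tag may be None
print(build(Pet, allow_none_optionals=False))  # tag is always a string
```

With the switch off, every instance is fully populated, which is usually what you want when exercising an integration end to end.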
Todo
- It’s the first part of a service that can be configured to generate random data for a specific model at fixed or random intervals and in fixed or random quantities, which is super useful for testing integrations.
- Lots of improvements needed here.