TL;DR: An engine is just a collection of reusable components.
Wall of text:
Games store a representation of the world. For example, you’d have an entity like a house, which has a position and a mesh that defines what its geometry looks like.
Now, you probably have more than one entity. Let’s say we’ve got a player-controlled character and some boxes. When the player wants to move, you’d move the character and check whether it collides with any boxes. That’s where physics comes into play. Depending on how the physics is modelled, the character might walk straight through the boxes, stop in front of them, or push them.
Mind you, this only changes the internal state of the game world.
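To make that concrete, here’s a rough sketch of the move-and-collide step on a simple grid. Everything in it (the `Entity` class, `move_player`, and the `"ghost"`/`"solid"`/`"push"` mode names) is made up for illustration, not taken from any real engine, and real physics is a lot more involved.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """A thing in the game world: just a name and a grid position here."""
    name: str
    x: int
    y: int


def move_player(player, boxes, dx, dy, mode="solid"):
    """Try to move the player by (dx, dy).

    `mode` is a hypothetical switch between three physics models:
    "ghost" (walk through boxes), "solid" (stop in front of a box),
    "push" (shove the box one cell further along).
    """
    tx, ty = player.x + dx, player.y + dy
    # Collision check: is there a box on the target cell?
    hit = next((b for b in boxes if (b.x, b.y) == (tx, ty)), None)
    if hit is not None:
        if mode == "solid":
            return  # blocked: the player stays where it is
        if mode == "push":
            # Simplification: we don't check what is behind the box.
            hit.x += dx
            hit.y += dy
        # mode == "ghost": ignore the box entirely
    player.x, player.y = tx, ty


player = Entity("player", 0, 0)
boxes = [Entity("box", 1, 0)]
move_player(player, boxes, 1, 0, mode="push")
print(player, boxes)
```

Note that nothing is drawn here; the function only updates positions stored in the world.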
Then, after calculating movement and physics, you’d render a frame, i.e. draw the current state of the world on the screen.
OpenGL and DirectX (strictly speaking Direct3D, the graphics part of DirectX) are graphics APIs; they are only used to render stuff (e.g. the character) to the screen.
So, you’re basically telling OpenGL/DirectX ‘I want to draw the character at position1 and a box at position2, and …’.
Of course, this is a gross oversimplification.
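As a toy illustration of the ‘draw this at that position’ idea, here is a sketch with a stand-in for the graphics API. `FakeGraphicsAPI` and its `draw` method are invented for this example; they are not real OpenGL or DirectX calls, which work at a much lower level (buffers, shaders, and so on).

```python
class FakeGraphicsAPI:
    """Stand-in for OpenGL/DirectX: it only records what it was asked to draw."""

    def __init__(self):
        self.calls = []

    def draw(self, mesh, position):
        # A real API would put pixels on the screen; we just log the request.
        self.calls.append((mesh, position))


def render_frame(api, world):
    """Issue one draw call per entity. The world itself is not modified."""
    for entity in world:
        api.draw(entity["mesh"], entity["position"])


world = [
    {"mesh": "character", "position": (1, 0)},
    {"mesh": "box", "position": (2, 0)},
]
api = FakeGraphicsAPI()
render_frame(api, world)
print(api.calls)
```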
Anyway, since we don’t want to walk through that procedure step by step all the time, we put that stuff into functions and classes. So, we might have a physics class which holds all the functions needed for the physics simulation, or a graphics class which interfaces with the graphics API and draws our entities. Et voilà, we’ve got an engine.
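Here’s roughly what that bundling might look like. This is only a sketch: the `Physics`, `Graphics`, and `Engine` classes and the dictionary-based entities are assumptions made for the example, and the ‘physics’ is just position plus velocity times time.

```python
class Physics:
    """Holds the simulation code; here it only moves entities by their velocity."""

    def step(self, world, dt):
        for entity in world:
            x, y = entity["position"]
            vx, vy = entity["velocity"]
            entity["position"] = (x + vx * dt, y + vy * dt)


class Graphics:
    """Would talk to the graphics API; here it just records each frame."""

    def __init__(self):
        self.frames = []

    def draw(self, world):
        self.frames.append([(e["name"], e["position"]) for e in world])


class Engine:
    """The reusable bundle: update the world state, then render it."""

    def __init__(self):
        self.physics = Physics()
        self.graphics = Graphics()

    def tick(self, world, dt):
        self.physics.step(world, dt)  # changes internal state only
        self.graphics.draw(world)     # shows the current state


world = [{"name": "player", "position": (0.0, 0.0), "velocity": (2.0, 0.0)}]
engine = Engine()
engine.tick(world, 0.5)
print(engine.graphics.frames[-1])
```

The point is that a different game could reuse `Engine` as-is and only swap out the contents of `world`.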