This working paper is the result of my ECN Berlin lab rotation in Henning Sprekeler’s lab at TU Berlin. We use tools from Multi-Agent Reinforcement Learning (MARL) to learn collective behavior in a gradient-driven fashion. More specifically, we introduce a multi-agent environment whose reward function is designed around phenomenological observations of fish schools: survival, alignment, attraction, and repulsion. Check out the Swarm MARL environment, which follows the OpenAI Gym style.
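To make the reward design more concrete, here is a minimal sketch of how the four components (survival, alignment, attraction, repulsion) could be combined into a per-agent reward. It is an illustration under stated assumptions, not the environment's actual implementation: the function name `swarm_reward`, the two radii, the weights, and the choice of averaging over neighbors are all hypothetical, and headings are assumed to be unit vectors.

```python
import numpy as np


def swarm_reward(positions, headings, alive, i,
                 r_repulsion=1.0, r_attraction=5.0,
                 w_survival=1.0, w_align=0.5, w_attract=0.5, w_repulse=0.5):
    """Illustrative reward for agent i (hypothetical radii and weights).

    positions: (N, 2) array of agent positions
    headings:  (N, 2) array of unit heading vectors (assumed normalized)
    alive:     (N,) boolean array
    """
    if not alive[i]:
        return 0.0  # a dead agent collects no reward

    reward = w_survival  # survival: constant bonus for each step alive

    # Indices of all other living agents.
    others = np.array(
        [j for j in range(len(positions)) if j != i and alive[j]], dtype=int
    )
    if others.size == 0:
        return float(reward)

    dists = np.linalg.norm(positions[others] - positions[i], axis=1)
    too_close = dists < r_repulsion
    in_school = (dists >= r_repulsion) & (dists < r_attraction)

    # Repulsion: penalize the fraction of agents inside the repulsion radius.
    reward -= w_repulse * too_close.mean()
    # Attraction: reward the fraction of agents at a comfortable distance.
    reward += w_attract * in_school.mean()
    # Alignment: mean cosine between own heading and headings of nearby agents.
    nearby = others[dists < r_attraction]
    if nearby.size > 0:
        reward += w_align * float(np.mean(headings[nearby] @ headings[i]))

    return float(reward)


# Two agents swimming in parallel at a comfortable distance.
positions = np.array([[0.0, 0.0], [2.0, 0.0]])
headings = np.array([[1.0, 0.0], [1.0, 0.0]])
alive = np.array([True, True])
r0 = swarm_reward(positions, headings, alive, 0)
```

Normalizing each term by the number of living neighbors keeps the reward scale independent of school size in this sketch; a sum over neighbors would be an equally plausible choice.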