Implementing a Custom Environment That Follows the OpenAI Gym Interface

SigmoidFunction 2025. 11. 29. 11:04

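Problem Overview

Scenario: Suppose your company has an in-house robot simulator. It needs to be wrapped in the OpenAI Gym (gym.Env) interface so that a reinforcement learning agent can train on it.
Goal: Implement a 5x5 Grid World environment as a class.
Required methods:
1. __init__: Define the action space (discrete) and the observation space (Box or Discrete).
2. reset: Reset the agent to the start position and return the initial state.
3. step(action): Take an action and return the next state, the reward, the done flag, and an info dict.
4. render: Print the current state as text.
Environment rules:
- Map: 5x5 (0: empty, 1: obstacle, 2: goal)
- Rewards: reaching the goal (+10), hitting an obstacle (-5), any other move (-0.1, to encourage the shortest path).
- Termination: when the agent reaches the goal or hits an obstacle (the implementation below also ends the episode after max_steps = 20 steps).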
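
The rules above can be sketched as a lookup on a 5x5 grid of cell codes. This is only a minimal illustration: build_grid and reward_and_done are hypothetical helper names, and the obstacle and goal coordinates are assumed to match the ones used in the implementation in this post.

```python
EMPTY, OBSTACLE, GOAL = 0, 1, 2  # cell codes from the environment rules


def build_grid(size=5, obstacles=((1, 1), (2, 2), (3, 1)), goal=(4, 4)):
    """Return a size x size grid of cell codes (hypothetical helper)."""
    grid = [[EMPTY] * size for _ in range(size)]
    for row, col in obstacles:
        grid[row][col] = OBSTACLE
    grid[goal[0]][goal[1]] = GOAL
    return grid


def reward_and_done(grid, pos):
    """Map the cell the agent lands on to a (reward, done) pair."""
    cell = grid[pos[0]][pos[1]]
    if cell == GOAL:
        return 10.0, True
    if cell == OBSTACLE:
        return -5.0, True
    return -0.1, False  # ordinary move: small step penalty


grid = build_grid()
for grid_row in grid:
    print(grid_row)
print(reward_and_done(grid, (0, 1)))  # empty cell
print(reward_and_done(grid, (1, 1)))  # obstacle
print(reward_and_done(grid, (4, 4)))  # goal
```

The implementation in this post stores coordinates in lists instead of a grid array; both encode the same map, and the grid form is closer to the 0/1/2 notation used in the rules.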

import gym
from gym import spaces
import numpy as np

class SimpleGridEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super().__init__()
        # Grid size and per-episode step limit
        self.grid_size = 5
        self.max_steps = 20
        self.current_step = 0

        # Action space: up, down, left, right (4 actions)
        self.action_space = spaces.Discrete(4)

        # Observation space: the agent's raw (unnormalized) grid coordinates,
        # each in [0, grid_size - 1]. shape=(2,): [row, col]
        self.observation_space = spaces.Box(low=0, high=self.grid_size - 1, shape=(2,), dtype=np.float32)

        # Map layout: start at (0, 0), goal at (4, 4), plus fixed obstacles
        self.agent_pos = [0, 0]
        self.goal_pos = [4, 4]
        self.obstacles = [[1, 1], [2, 2], [3, 1]]  # example obstacle coordinates

    def reset(self):
        self.agent_pos = [0, 0]
        self.current_step = 0
        return np.array(self.agent_pos, dtype=np.float32)

    def step(self, action):
        self.current_step += 1
        
        # Movement logic (up, down, left, right); moves off the grid are clamped to the edge
        row, col = self.agent_pos
        if action == 0:   # Up
            row = max(0, row - 1)
        elif action == 1: # Down
            row = min(self.grid_size - 1, row + 1)
        elif action == 2: # Left
            col = max(0, col - 1)
        elif action == 3: # Right
            col = min(self.grid_size - 1, col + 1)
        
        new_pos = [row, col]
        
        # Reward and termination handling
        reward = -0.1  # per-step penalty
        done = False

        # 1. Obstacle collision
        if new_pos in self.obstacles:
            reward = -5.0
            done = True

        # 2. Goal reached
        elif new_pos == self.goal_pos:
            reward = 10.0
            done = True

        # 3. Step limit reached (the step penalty still applies)
        elif self.current_step >= self.max_steps:
            done = True
            
        self.agent_pos = new_pos
        
        # info carries auxiliary data such as debugging information
        info = {}
        
        return np.array(self.agent_pos, dtype=np.float32), reward, done, info

    def render(self, mode='human'):
        for r in range(self.grid_size):
            line = ""
            for c in range(self.grid_size):
                if [r, c] == self.agent_pos:
                    line += "A "
                elif [r, c] == self.goal_pos:
                    line += "G "
                elif [r, c] in self.obstacles:
                    line += "X "
                else:
                    line += ". "
            print(line)
        print("\n")

# Test code (for grading)
if __name__ == "__main__":
    env = SimpleGridEnv()
    obs = env.reset()
    env.render()

    print("--- Random Agent Action ---")
    for _ in range(5):
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        print(f"Action: {action}, State: {obs}, Reward: {reward}, Done: {done}")
        env.render()
        if done:
            break
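
As a sanity check on the reward design, the best achievable return can be estimated with a breadth-first search over the free cells. The sketch below assumes the same start, goal, and obstacle coordinates as the class above; shortest_path_length and episode_return are hypothetical helper names, and the code runs independently of Gym.

```python
from collections import deque

GRID_SIZE = 5
START, GOAL = (0, 0), (4, 4)
OBSTACLES = {(1, 1), (2, 2), (3, 1)}
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def shortest_path_length(start=START, goal=GOAL):
    """Breadth-first search over free cells; returns the move count, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (row, col), dist = queue.popleft()
        if (row, col) == goal:
            return dist
        for d_row, d_col in MOVES:
            nxt = (row + d_row, col + d_col)
            in_bounds = 0 <= nxt[0] < GRID_SIZE and 0 <= nxt[1] < GRID_SIZE
            if in_bounds and nxt not in OBSTACLES and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def episode_return(num_moves, step_penalty=-0.1, goal_reward=10.0):
    """Every move except the last earns the step penalty; the last earns the goal reward."""
    return step_penalty * (num_moves - 1) + goal_reward


n_moves = shortest_path_length()
best_return = episode_return(n_moves)
print(n_moves, round(best_return, 2))
```

With this layout the shortest route takes 8 moves, well under the 20-step limit, for a return of about 9.3. Each extra move costs another 0.1, which is what pushes a learning agent toward the shortest path.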

Github : https://github.com/DrunkJin
