[Python] Viewing and Deleting Only the Rows That Contain Nulls in a DataFrame


SigmoidFunction 2023. 2. 7. 11:22


When working with data, you run into null values quite often.

How to handle them is always a bit of a dilemma.

In this post, we'll look at how to inspect just the rows that contain null values and delete only those rows.

import pandas as pd
import numpy as np

# None is included among the choices so that some entries are missing.
dogs = np.random.choice(['labradoodle', 'beagle', 'mutt', 'Golden Retrievers',
                         'Greyhound', 'French Bulldog', 'Shih Tzu', None], size=50_000)
smell = np.random.randint(0, 100, size=50_000)
location = np.random.choice(['Korea', 'China', 'United States', 'Japan', 'France',
                             'United Kingdom', 'Taiwan', None], size=50_000)

# Building from a dict keeps smell as an integer column
# (stacking with np.array([...]).T would make every column object dtype).
df = pd.DataFrame({'dogs': dogs, 'location': location, 'smell': smell})

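First, we create some fake data. Let's assume a DataFrame with 50,000 rows. Missing values were put into dogs and location by including None among the choices.

Entries shown as None are the missing values.

In my case, dogs has 6,237 missing values and location has 6,206. The data is random, so your counts will differ.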
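The per-column counts mentioned above can be obtained in one line. Here is a minimal sketch on a tiny hand-made frame (the name `sample` and its values are made up for illustration and stand in for the 50,000-row `df`):

```python
import pandas as pd

# Small hypothetical frame standing in for the 50,000-row df.
sample = pd.DataFrame({
    'dogs': ['beagle', None, 'mutt', None],
    'location': ['Korea', 'Japan', None, None],
    'smell': [10, 20, 30, 40],
})

# isnull() marks each missing cell as True; sum() counts the Trues per column.
null_counts = sample.isnull().sum()
print(null_counts)
```

Running the same `isnull().sum()` on the real `df` should give counts in the neighborhood of the ones quoted above, since None is one of eight equally likely choices.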

Let's look at just the rows that have missing values.

df[df['dogs'].isnull()]

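This prints the rows where the dogs column is null.

The other columns play no part in the filter.

The same works for location.

However, passing dogs and location together in the same way, as in df[df[['dogs', 'location']].isnull()], does not filter rows. isnull() on two columns returns a boolean DataFrame rather than one True/False value per row, so pandas masks individual cells instead of selecting rows.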
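If you do want to check both columns at once, one option is to collapse the two-column result into a single boolean per row. This is a sketch of that idea rather than what the original post does; `sample`, `cell_mask`, and `row_mask` are names made up for the example:

```python
import pandas as pd

# Small hypothetical frame standing in for df.
sample = pd.DataFrame({
    'dogs': ['beagle', None, 'mutt', None],
    'location': ['Korea', 'Japan', None, None],
    'smell': [10, 20, 30, 40],
})

# isnull() on two columns returns a boolean DataFrame, not a per-row mask.
cell_mask = sample[['dogs', 'location']].isnull()

# any(axis=1) collapses it to one boolean per row:
# True if either of the two columns is null in that row.
row_mask = cell_mask.any(axis=1)

# Equivalent spelling with the element-wise OR operator.
row_mask_or = sample['dogs'].isnull() | sample['location'].isnull()

rows_with_null = sample[row_mask]
print(rows_with_null)
```

Use `all(axis=1)` instead of `any(axis=1)` if you only want rows where both columns are null.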

So how do we remove only the rows that contain these missing values?

The approach is to use the index.

dogs_null_index = df[df['dogs'].isnull()].index

This gets the index labels of the rows where dogs is missing, and then we drop them:

# Returns a new DataFrame; df itself is unchanged unless you reassign it.
df.drop(dogs_null_index, axis=0)

# To overwrite df in place:
# df.drop(dogs_null_index, axis=0, inplace=True)

The rows where dogs is null have been removed.

Doing the same for location removes its null rows as well.
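For reference, the location step might look like the sketch below. It repeats the two steps used for dogs on a small made-up frame (`sample` and `cleaned` are illustrative names) and reassigns the result instead of using inplace=True:

```python
import pandas as pd

# Small hypothetical frame standing in for df.
sample = pd.DataFrame({
    'dogs': ['beagle', 'mutt', 'Greyhound', 'Shih Tzu'],
    'location': ['Korea', 'Japan', None, None],
    'smell': [10, 20, 30, 40],
})

# Same two steps as for dogs: collect the index labels, then drop them.
location_null_index = sample[sample['location'].isnull()].index
cleaned = sample.drop(location_null_index, axis=0)  # returns a new DataFrame

print(cleaned)
```

One thing to keep in mind: drop works on index labels, so this approach assumes the index is unique. With duplicate labels, every row sharing a dropped label is removed.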

Now let's wrap this up in a single function.

First, create the dummy data:

dogs = np.random.choice(['labradoodle','beagle','mutt','Golden Retrievers','Greyhound','French Bulldog','Shih Tzu',None], size=50_000)
smell = np.random.randint(0, 100, size = 50_000)
location = np.random.choice(['Korea','China','United States', 'Japan', 'France','United Kingdom','Taiwan', None], size = 50_000)
blah1 = np.random.choice(['AA','NN','BB', 'VV','CC','RR', None], size = 50_000)
blah2 = np.random.choice(['aa','ss','ww', 'qq','ww','dd', None], size = 50_000)
# A dict keeps smell as an integer column instead of object dtype.
df = pd.DataFrame({'dogs': dogs, 'location': location, 'smell': smell,
                   'blah1': blah1, 'blah2': blah2})

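def remove_row_null_data(df):
    """Drop every row that has a null in any column. Modifies df in place."""
    for col_name in df.columns:
        # Index labels of the rows where this column is null
        null_index = df[df[col_name].isnull()].index
        df.drop(null_index, axis=0, inplace=True)
    return df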

That takes care of it quite simply.
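As a cross-check, pandas also has a built-in `dropna()` that drops rows containing any missing value. The sketch below compares it against a non-mutating variant of the function above. Note that `remove_row_null_data_copy` and `sample` are hypothetical names introduced here, not part of the original post:

```python
import pandas as pd


def remove_row_null_data_copy(df):
    """Hypothetical non-mutating variant: returns a new DataFrame."""
    result = df
    for col_name in df.columns:
        # Index labels of the rows where this column is null
        null_index = result[result[col_name].isnull()].index
        # drop() without inplace returns a new frame, so df is left untouched
        result = result.drop(null_index, axis=0)
    return result


# Small hypothetical frame standing in for df.
sample = pd.DataFrame({
    'dogs': ['beagle', None, 'mutt', None],
    'location': ['Korea', 'Japan', None, None],
    'smell': [10, 20, 30, 40],
})

by_index = remove_row_null_data_copy(sample)
by_dropna = sample.dropna()  # built-in: drops rows with any missing value

print(by_index.equals(by_dropna))
```

The index-based loop is handy when you want to look at the offending rows before deleting them. If all you need is the cleaned frame, `dropna()` is the shorter route, and its `subset` parameter limits the check to specific columns.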


Github : https://github.com/DrunkJin
