[Pandas 기초] 함수 매핑(mapping)

업데이트: August 07, 2019

함수 매핑(mapping)

함수 매핑이란 시리즈나 데이터프레임의 개별 원소를 특정 함수에 일대일 대응시키는 과정을 의미한다. 사용자가 직접 만든 함수(lambda함수 포함)를 적용헐 수 있기 때문에 for문문이나 기본 함수로 처리하는 것보다 훨씬 효율적인 장점이 있다.

1. 개별 원소에 함수 매핑

1-1.`Series객체.apply()` 함수

이 함수는 시리즈 객체의 개별원소에 함수를 적용할때 사용한다.

seaborn라이브러리의 타이타닉 데이터를 불러와서 age열과 fare열만 인덱싱 한 후 ten이라는 열에 10을 넣어보자.

import seaborn as sns
titanic = sns.load_dataset('titanic')
df = titanic.loc[:, ['age','fare']]
df['ten'] = 10
df.head()

	age	fare	ten
0	22.0	7.2500	10
1	38.0	71.2833	10
2	26.0	7.9250	10
3	35.0	53.1000	10
4	35.0	8.0500	10

그리고 두가지 사용자함수를 정의해보자

def add_10(n):
    return n + 10         # 10을 더해주는 함수

def add_two_obj(a,b):
    return a + b          # a와 b를 더해주는 함수

print(add_n(10))
print(add_two_obj(10,10))

[Output]
20
20

이제 이 함수들을 apply()함수를 이용해 시리즈객체(age열)의 개별원소에 적용시켜보자

sr1 = df['age'].apply(add_10)  # age의 모든 원소 + 10
print(sr1.head())
print('\n')

sr2 = df['age'].apply(add_two_obj, b=10)   # a는 age의 모든 원소, b=10
print(sr2.head())
print('\n')

sr3 = df['age'].apply(lambda x: add_n(x))   # lambda함수 활용
print(sr3.head())

[Output]
  32.0
  48.0
  36.0
  45.0
  45.0
Name: age, dtype: float64

  32.0
  48.0
  36.0
  45.0
  45.0
Name: age, dtype: float64

  32.0
  48.0
  36.0
  45.0
  45.0
Name: age, dtype: float64

1-2. `DataFrame객체.applymap()` 함수

이 함수는 데이터프레임의 개별원소에 함수를 적용할때 사용한다.

df_map = df.applymap(add_10)   #데이터프레임의 모든 원소에 함수를 적용해줌
print(df_map.head())

[Output] 
    age     fare
32.0  17.2500
48.0  81.2833
36.0  17.9250
45.0  63.1000
45.0  18.0500

2. 시리즈 객체에 함수 매핑

2-1. `DataFrame객체.apply(매핑함수, axis = )` 함수

데이터프레임의 열 또는 행에 대해 함수를 적용시키고 싶다면, apply함수를 적용하면 된다.

이번에는 결측값을 확인하는 사용자 함수를 정의해보자

def missing_value(series):
    return series.isnull()

이제 이 함수를 데이터프레임의 열(axis=1)에 매핑해보자

result = df.apply(missing_value, axis = 1)
print(result.head())
print('\n')
print(type(result))

[Output]
     age   fare
0  False  False
1  False  False
2  False  False
3  False  False
4  False  False

<class 'pandas.core.frame.DataFrame'>

이번에는 최대값에서 최소값을 뺀 값을 반환하는 사용자함수를 정의한다.

def min_max(x):
    return x.max() - x.min()

이 함수를 데이터프레임에 행(axis=0), 열(axis=1)에 대하여 연산을 해보자.

result = df.apply(min_max, axis = 0)
result1 = df.apply(min_max, axis = 1)
print(df.head())
print('\n')
print(result.head())
print('\n')
print(result1.head())

[Output]
    age     fare
0  22.0   7.2500
1  38.0  71.2833
2  26.0   7.9250
3  35.0  53.1000
4  35.0   8.0500

age      79.5800
fare    512.3292
dtype: float64

0    14.7500
1    33.2833
2    18.0750
3    18.1000
4    26.9500
dtype: float64

axis옵션이 나는 초반에 많이 헷갈렸다. 헷갈리는 부분을 정리해보면 다음과 같다.

axis = 0 : (생략가능), 위아래(각 행들의 연산), 열에 대한 결과
axis = 1 (생략불가) 양옆(각 열들의 연산), 행에 대한 결과

데이터프레임에서 apply객체의 원리를 lambda함수로 확인을 해보면 x자체는 데이터프레임 임을 알 수 있다.
아래에서 x(df)의 열(age,ten)에 대해 연산을 한 결과를 확인하자

def add_two_obj(a, b):
    return a + b

df['add'] = df.apply(lambda x: add_two_obj(x['age'],x['ten']), axis = 1)
print(df.head())

[Output]
    age     fare  ten
22.0   7.2500   10
38.0  71.2833   10
26.0   7.9250   10
35.0  53.1000   10
35.0   8.0500   10


    age     fare  ten   add
22.0   7.2500   10  32.0
38.0  71.2833   10  48.0
26.0   7.9250   10  36.0
35.0  53.1000   10  45.0
35.0   8.0500   10  45.0

3. 데이터프레임 객체에 함수 매핑

3-1. `DataFrame객체.pipe()` 함수

pipe()함수는 데이터프레임객체에 바로 적용시키는 함수로, 입력인자를 들어가는 매핑함수의 반환형태를 그대로 따라서 값을 반환해준다.

import seaborn as sns
titanic = sns.load_dataset('titanic')
df = titanic.loc[:, ['age','fare']]

# 사용자 정의함수
def missing_value(x):
    return x.isnull()

def missing_count(x):
    return missing_value(x).sum()

def total_number_missing(x):
    return missing_count(x).sum()

순차적으로 적용

result_df = df.pipe(missing_value)
print(result_df.head())
print(type(result_df))
print('\n')


result_sr = df.pipe(missing_count)
print(result_sr.head())
print(type(result_sr))
print('\n')


result_value = df.pipe(total_number_missing)
print(result_value)
print(type(result_value))

[Output]
     age   fare
0  False  False
1  False  False
2  False  False
3  False  False
4  False  False
<class 'pandas.core.frame.DataFrame'>

age     177
fare      0
dtype: int64
<class 'pandas.core.series.Series'>

177
<class 'numpy.int64'>

Reference

도서 [파이썬 머신러닝 판다스 데이터 분석]을 공부하며 작성하였습니다.

Twitter Facebook LinkedIn

Yonggeun Shin

[Pandas 기초] 함수 매핑(mapping)

함수 매핑(mapping)

1. 개별 원소에 함수 매핑

1-1.`Series객체.apply()` 함수

1-2. `DataFrame객체.applymap()` 함수

2. 시리즈 객체에 함수 매핑

2-1. `DataFrame객체.apply(매핑함수, axis = )` 함수

3. 데이터프레임 객체에 함수 매핑

3-1. `DataFrame객체.pipe()` 함수

Reference

공유하기

댓글남기기

참고

[서평] 한가지에 집중하라!, The One Thing - 게리 켈러, 제이 파파산

[Hive] Hive CREATE TABLE 다양한 옵션 정리

[Hive] Hive Table 다루기 (DESCRIBE, CREATE, LOAD, INSERT, DROP, VIEW, CTAS)

[Linux] Linux 기본 명령어 정리

Yonggeun Shin

함수 매핑(mapping)

1. 개별 원소에 함수 매핑

1-1.Series객체.apply() 함수

1-2. DataFrame객체.applymap() 함수

2. 시리즈 객체에 함수 매핑

2-1. DataFrame객체.apply(매핑함수, axis = ) 함수

3. 데이터프레임 객체에 함수 매핑

3-1. DataFrame객체.pipe() 함수

Reference

공유하기

댓글남기기

참고

[서평] 한가지에 집중하라!, The One Thing - 게리 켈러, 제이 파파산

[Hive] Hive CREATE TABLE 다양한 옵션 정리

[Hive] Hive Table 다루기 (DESCRIBE, CREATE, LOAD, INSERT, DROP, VIEW, CTAS)

[Linux] Linux 기본 명령어 정리

1-1.`Series객체.apply()` 함수

1-2. `DataFrame객체.applymap()` 함수

2-1. `DataFrame객체.apply(매핑함수, axis = )` 함수

3-1. `DataFrame객체.pipe()` 함수