Prepare Your Iterables – Python Iterables Part 2 of N

บทความที่แล้วได้กล่าวถึง for statement ของภาษา Python ซึ่งจะต้องใช้งานควบคู่กับวัตถุประเภท iterable เท่านั้น

บทความนี้จะพูดถึงเทคนิคการจัดการกับลำดับของข้อมูล ( ⁠sequence of data ⁠) ที่จะช่วยให้ for statement อ่านเข้าใจง่าย ดูสละสลวยขึ้น และยังมีประสิทธิภาพในการทำงานที่ดีอีกด้วย

เราจะสนใจโจทย์ปัญหาต่อไปนี้เพื่อทำให้เรื่องราวที่จะนำเสนอเป็นรูปธรรมยิ่งขึ้น

ปัญหา ก. การคำนวณหา local maxima ในข้อมูลตารางสองมิติ

ปัญหา

สมมติว่ามีข้อมูลเป็นตารางสองมิติ โดยที่แต่ละช่องมีค่าเป็นจำนวนจริง

นิยาม: ข้อมูลในช่องหนึ่ง ๆ จะจัดว่าเป็น local maximum ถ้าช่องดังกล่าวมีค่ามากกว่าข้อมูลในช่องที่อยู่ติดกันในแนวตั้ง ในแนวนอน หรือในแนวทแยงมุม ( ⁠เรียกรวม ๆ ว่าเป็น “ช่องเพื่อนบ้าน” ⁠)

จงหาพิกัดของค่า local maximamaxima เป็นคำพหูพจน์ของคำว่า maximum ทุกช่องที่ปรากฏในข้อมูลตารางที่กำหนดให้ ให้ตอบพิกัดซึ่งเป็นค่าดัชนีแนวตั้ง ( ⁠row index ⁠) และแนวนอน ( ⁠column index ⁠) ของแต่ละช่อง

เพื่อความสะดวกและรวดเร็ว จะขอกำหนดให้ข้อมูลตารางอยู่ในรูปของ numpy.ndarray สองมิติ แต่วิธีการแก้ปัญหาที่จะนำเสนอต่อจากนี้ไป จะไม่จำเพาะเจาะจงกับไลบรารี่ numpy เพียงอย่างเดียว ผู้อ่านไม่จำเป็นต้องมีประสบการณ์ในการใช้งานไลบรารี่ดังกล่าวมาก่อนแต่อย่างใด

เราจะยกตัวอย่างข้อมูลตารางสองมิติ ในตัวแปรชื่อ grid ซึ่งมีค่าดังต่อไปนี้

import numpy as np

grid = np.array([[-4,  5, -2,  0],
                 [ 8,  6,  9,  1],
                 [-2, -3,  5,  2],
                 [-3,  5,  4,  6]])

ข้อมูลตาราง grid ในตัวอย่างข้างต้น จะมี local maxima อยู่ 3 ช่องซึ่งปรากฏอยู่ที่ตำแหน่ง ( ⁠1, 0 ⁠), ( ⁠1, 2 ⁠) และ ( ⁠3, 3 ⁠) ตามลำดับ

ช่องตาราง grid[ ⁠1, 0 ⁠] มีค่าเท่ากับ 8 จัดว่าเป็น local maximum เพราะมีมูลค่ามากกว่าช่องเพื่อนบ้านทั้งหมดที่อยู่ติดกัน ( ⁠อันได้แก่ -4, 5, 6, -3 และ -2 เมื่อพิจารณาช่องเพื่อนบ้านแต่ละช่องเริ่มจากช่องบน วนตามเข็มนาฬิกา ⁠)
ช่องตาราง grid[ ⁠1, 2 ⁠] มีค่าเท่ากับ 9 จัดว่าเป็น local maximum อีกช่องหนึ่ง เพราะมีมูลค่ามากกว่าช่องเพื่อนบ้านทั้งแปดช่องที่อยู่ติดกัน ( ⁠อันได้แก่ -2, 0, 1, 2, 5, -3, 6 และ 5 เมื่อพิจารณาช่องเพื่อนบ้านเริ่มจากช่องบน วนตามเข็มนาฬิกา ⁠)
ช่องตาราง grid[ ⁠3, 3 ⁠] มีค่าเท่ากับ 6 จัดว่าเป็น local maximum เพราะมีมูลค่ามากกว่าช่องที่อยู่ด้านบนทางซ้าย ( ⁠ซึ่งมีมูลค่า 2 ⁠), ช่องซ้าย ( ⁠มูลค่า 4 ⁠) และช่องบนซ้าย ( ⁠มูลค่า 5 ⁠)

ส่วนช่องอื่น ๆ ไม่จัดว่าเป็น local maximum เพราะอาจจะมีช่องเพื่อนบ้านบางช่องที่มีมูลค่ามากกว่าหรือเท่ากับช่องข้อมูลนั้น ๆ

เป้าหมาย: ต่อจากนี้ไปจะขอนำเสนอฟังก์ชันในภาษา Python ที่จะช่วยแก้ปัญหา ก. ข้างต้น โดยเราจะเริ่มพิจารณาจากโค้ดที่ตรงไปตรงมาที่สุด ( ⁠แต่อาจจะมีจุดบกพร่องบ้าง ⁠) แล้วเราจะพยายามปรับปรุงโค้ดดังกล่าวทีละนิด จนสุดท้ายกลายเป็นโค้ดที่อ่านง่าย ดูสละสลวย และมีประสิทธิภาพที่ดี

Version 1 – Naïve approach

โค้ดต่อไปนี้นำเสนอการคำนวณ local maxima จากข้อมูลตารางโดยใช้วิธีที่ถึกอย่างตรงไปตรงมาที่สุด ( ⁠มีคำอธิบายประกอบอยู่ด้านล่างโค้ด ⁠)

from typing import List, Tuple
import numpy as np

def local_maxima(table: np.ndarray) -> List[Tuple[int, int]]:
    """
    Compute a list of table indices where each value is a local maximum
    in the given table data when comparing to adjacent table cells
    (either horizontally, vertically, or diagonally).
    """
    # data.shape describes the number of rows and columns of the table
    n_rows, n_cols = table.shape
    local_maxima_indices = []

    for i in range(n_rows):
        for j in range(n_cols):
            if i > 0 and j > 0 and table[i-1, j-1] >= table[i, j]:
                continue  # invalidated by above-left cell
            if i > 0 and table[i-1, j] >= table[i, j]:
                continue  # invalidated by above cell
            if i > 0 and j < n_cols-1 and table[i-1, j+1] >= table[i, j]:
                continue  # invalidated by above right cell
            if j > 0 and table[i, j-1] >= table[i, j]:
                continue  # invalidated by left cell
            if j < n_cols-1 and table[i, j+1] >= table[i, j]:
                continue  # invalidated by right cell
            if i < n_rows-1 and j > 0 and table[i+1, j-1] >= table[i, j]:
                continue  # invalidated by below left cell
            if i < n_rows-1 and table[i+1, j] >= table[i, j]:
                continue  # invalidated by below cell
            if i < n_rows-1 and j < n_cols-1 and table[i+1, j+1] >= table[i, j]:
                continue  # invalidated by below right cell

            # Append the cell if survived checks until this point
            local_maxima_indices.append((i, j))

    return local_maxima_indices

ตัวอย่างการเรียกใช้งานฟังก์ชัน local_maxima ( ⁠Version 1 ⁠) ข้างต้น

>>> local_maxima(grid)
[(1, 0), (1, 2), (3, 3)]

โอ้แม่เจ้า ถึกเสียนี่กะไร! สังเกตว่าเรามี if statement ทั้งสิ้น 8 ชุดเพื่อตรวจสอบว่าค่าของแต่ละช่อง ( ⁠i, j ⁠) นั้นมีค่ามากกว่าหรือเท่ากับช่องเพื่อนบ้านที่อยู่ติดกันทั้ง 8 ช่องหรือไม่ ( ⁠โดยเริ่มต้นเช็คกับช่องเพื่อนบ้านทางมุมบนซ้าย จนไปสิ้นสุดที่ช่องที่มุมล่างขวา ⁠) หากผิดไปจากนี้แปลว่าค่าของช่อง ( ⁠i, j ⁠) จะไม่มีทางเป็น local maxima อย่างแน่นอน

ความเลวร้ายยังไม่หยุดแค่นั้น! ในกรณีที่ช่อง ( ⁠i, j ⁠) อยู่บนขอบตาราง จะต้องระมัดระวังไม่นำค่าของช่องดังกล่าวไปเปรียบเทียบกับค่าของช่องที่หลุดขอบตารางออกไป เช่น ถ้า j == 0 แล้วจะไม่สามารถเปรียบเทียบค่าของช่อง ( ⁠i, j ⁠) กับช่อง ( ⁠i, j-1 ⁠) ที่อยู่ทางซ้ายมือได้ ( ⁠จึงเป็นที่มาของการตรวจเช็คเงื่อนไขในรูปของ i > 0, i < r-1, j > 0 และ j < c-1 ในโค้ดข้างต้น ⁠)

แม้ว่าโค้ดข้างต้นนี้จะมีลอจิกที่เข้าใจง่าย แต่ก็ไม่ได้หมายความว่าจะเป็นโค้ดที่อ่านให้เข้าใจได้อย่างรวดเร็วทันทีนัก และยังมีโอกาสเสี่ยงต่อการพิมพ์ผิดเล็ก ๆ น้อย ๆ อีกด้วย ( ⁠เอ๊ะ? มีหรือเปล่านะ? ⁠) นอกจากนี้แล้วถ้าวันหนึ่งเราจะต้องเขียนฟังก์ชัน local_minima ซึ่งทำงานคล้ายกับ local_maxima แต่ต้องคำนวณหาตำแหน่งของ local minima ทั้งหมดในตารางแทนหละ เราจะก็อปโค้ดนี้แล้วมาแก้เครื่องหมาย >= ให้เป็น <= ทีละบรรทัด ก็ไม่ใช่วิธีที่สวยงามเท่าไหร่นัก

Version 2 – Local 3 × 3 grid approach

จุดสังเกตแรกที่จะช่วยทำให้โค้ดกระชับขึ้นก็คือ เงื่อนไขการเปรียบเทียบ data[ ⁠i, j ⁠] กับช่องเพื่อนบ้านที่อยู่ติดกันมี pattern ที่คล้ายคลึงกัน ( ⁠ทุกอสมการอยู่ในรูปของ data[ ⁠... ⁠] >= data[ ⁠i, j ⁠] ⁠) เราสามารถใช้จุดสังเกตดังกล่าวมาเขียนโค้ดให้ง่ายขึ้นได้ดังนี้

from typing import List, Tuple
import numpy as np

def local_maxima(table: np.ndarray) -> List[Tuple[int, int]]:
    """
    Compute a list of table indices where each value is a local maximum
    in the given table data when comparing to adjacent table cells
    (either horizontally, vertically, or diagonally).
    """
    n_rows, n_cols = table.shape
    local_maxima_indices = []

    for i in range(n_rows):
        for j in range(n_cols):
            is_candidate = True  # whether (i, j) is still a valid local maximum
            for nbr_i in [i-1, i, i+1]:
                for nbr_j in [j-1, j, j+1]:
                    # Check first if neighbor (nbr_i, nbr_j) really exists
                    if (nbr_i, nbr_j) == (i, j):
                        continue  # not really a neighbor
                    if not (0 <= nbr_i < n_rows and 0 <= nbr_j < n_cols):
                        continue  # neighbor out of bounds
                    # Comparison validation as before
                    if table[nbr_i, nbr_j] >= table[i, j]:
                        is_candidate = False
            if is_candidate:
                local_maxima_indices.append((i, j))

    return local_maxima_indices

การเรียกใช้งานฟังก์ชัน local_maxima ( ⁠Version 2 ⁠) ยังคงให้ผลลัพธ์เหมือนเดิม

>>> local_maxima(grid)
[(1, 0), (1, 2), (3, 3)]

กล่าวคือเราใช้ ( ⁠nbr_i, nbr_j ⁠) แทนตำแหน่งของช่องแต่ละช่องในตารางย่อย 3 × 3 ที่อยู่ในละแวกเพื่อนบ้านของช่องหลัก ( ⁠i, j ⁠) โดยใช้ nested for loop ชั้นที่ 3 และ 4 ในโค้ดข้างต้น

ข้อพึงระวัง: เนื่องจากฟังก์ชัน local_maxima ( ⁠Version 2 ⁠) ของเราจะพิจารณาตำแหน่งช่อง ( ⁠nbr_i, nbr_j ⁠) ทั้งหมด 9 ช่องจากเดิมแค่ 8 ช่อง เราจึงต้องระมัดระวังไม่นำเคสที่ ( ⁠nbr_i, nbr_j ⁠) == ( ⁠i, j ⁠) มาพิจารณา

Version 3 – Prepare your iterables

แม้ว่าโค้ดใน Version 2 นี้จะปรับปรุงให้กระชับขึ้นมากกว่าโค้ดใน Version 1 แล้ว แต่ก็ทำให้โค้ดบางส่วนอ่านยากขึ้น เพราะมี for statement ซ้อนกันถึง 4 ชั้นเลยทีเดียว

เราจะจัดการกับโค้ดเจ้าปัญหาดังกล่าวให้อ่านง่ายขึ้น ด้วยการ refactor ลอจิกบางส่วนออกจากฟังก์ชันหลักอย่าง local_maxima ให้กลายเป็นฟังก์ชันแยกต่างหาก กล่าวคือเราจะมีฟังก์ชันใหม่อันหนึ่งที่เอาไว้ช่วยไล่เรียงช่อง ( ⁠i, j ⁠) ทุกช่องในตาราง และอีกฟังก์ชันหนึ่งเอาไว้ช่วยไล่เรียงช่องเพื่อนบ้าน ( ⁠nbr_i, nbr_j ⁠) ทุกช่องที่อยู่ติดกับช่อง ( ⁠i, j ⁠) ที่กำหนดให้

โปรแกรมเวอร์ชันใหม่ของเรามีหน้าตาดังต่อไปนี้

from typing import List, Tuple
import numpy as np

def local_maxima(table: np.ndarray) -> List[Tuple[int, int]]:
    """
    Compute a list of table indices where each value is a local maximum
    in the given table data when comparing to adjacent table cells
    (either horizontally, vertically, or diagonally).
    """
    local_maxima_indices = []
    for pos in table_cells(table.shape):
        is_candidate = True
        for nbr_pos in adjacent_cells(table.shape, pos):
            if table[nbr_pos] >= table[pos]:
                is_candidate = False
        if is_candidate:
            local_maxima_indices.append(pos)

    return local_maxima_indices

def table_cells(shape: Tuple[int, int]) -> List[Tuple[int, int]]:
    """
    A list of indices for all cells in a table with a given shape
    which is a tuple of number of rows and columns, respectively.
    """
    n_rows, n_cols = shape
    return [(i, j) for i in range(n_rows) for j in range(n_cols)]

def adjacent_cells(
        shape: Tuple[int, int],
        main_pos: Tuple[int, int],
) -> List[Tuple[int, int]]:
    """
    A list of indices of all adjacent cells to the cell indicated by
    the provided main pos, subject to the given shape of the table.
    """
    n_rows, n_cols = shape
    i, j = main_pos

    adjacent_indices = []
    for nbr_i in [i-1, i, i+1]:
        for nbr_j in [j-1, j, j+1]:
            if (nbr_i, nbr_j) == (i, j):
                continue
            if 0 <= nbr_i < n_rows and 0 <= nbr_j < n_cols:
                adjacent_indices.append((nbr_i, nbr_j))

    return adjacent_indices

เมื่อเทียบโค้ดของฟังก์ชัน local_maxima เวอร์ชันใหม่กับเวอร์ชันที่แล้ว ( ⁠Version 2 ⁠) มีสิ่งที่เปลี่ยนแปลงไปดังนี้

ช่อง ( ⁠i, j ⁠) แต่ละช่องของตารางจะถูกสร้างโดยฟังก์ชัน table_cells แทนที่จะเขียน for statement ซ้อนกัน 2 ชั้นขึ้นมาโดยตรง
ช่องเพื่อนบ้าน ( ⁠nbr_i, nbr_j ⁠) แต่ละช่องที่อยู่ติดกับช่อง ( ⁠i, j ⁠) ซึ่งไม่ตกขอบตาราง จะถูกสร้างโดยฟังก์ชัน adjacent_cells แทนที่จะใช้ for statement ซ้อนกันอีก 2 ชั้นผสมกับ if statement เพิ่มเติม

สังเกตว่าโค้ดของฟังก์ชัน local_maxima ( ⁠Version 3 ⁠) นี้กระชับและอ่านง่ายขึ้นกว่าเดิม เราสามารถมองเห็น intention ของฟังก์ชันได้จากโค้ดโดยตรง โดยเราไม่จำเป็นต้องมานั่งตีความอีกต่อไปว่า for statement ที่ซ้อนกันแต่ละชั้น มีจุดประสงค์เอาไว้ทำงานอะไรบ้าง

นอกจากนั้น ฟังก์ชัน table_cells และ adjacent_cells ที่แยกออกมานั้น สามารถนำไปใช้ประโยชน์ซ้ำในโอกาสอื่น ๆ ได้ ( ⁠เช่น นำไปใช้เขียนฟังก์ชัน local_minima เป็นต้น ⁠)

เมื่อเราลองใช้งาน local_maxima ( ⁠Version 3 ⁠) ก็พบว่าโปรแกรมยังให้ผลลัพธ์เหมือนเดิม

>>> local_maxima(grid)
[(1, 0), (1, 2), (3, 3)]

เอาหละ! มาพักหายใจกันแป๊บหนึ่ง เดี๋ยวมาดูกันว่าเราจะทำให้โค้ด Version 3 ข้างต้นให้ดียิ่งขึ้นได้อย่างไรบ้าง

[ ⁠insert elevator song here ⁠]

Version 4 Part A: Improving main function

เราจะใช้เครื่องมือบางอย่างที่ Python จัดเตรียมไว้เพื่อทำให้ฟังก์ชัน local_maxima สั้นลงกว่าเดิม มาพิจารณาโค้ดเวอร์ชันใหม่นี้กันเลย

from typing import List, Tuple
import numpy as np

def local_maxima(table: np.ndarray) -> List[Tuple[int, int]]:
    """
    Compute a list of table indices where each value is a local maximum
    in the given table data when comparing to adjacent table cells
    (either horizontally, vertically, or diagonally).
    """
    local_maxima_indices = []
    for pos in table_cells(table.shape):
        if all(table[pos] > table[nbr_pos]
               for nbr_pos in adjacent_cells(table.shape, pos)):
            local_maxima_indices.append(pos)

    return local_maxima_indices

เมื่อพิจารณาฟังก์ชัน local_maxima เวอร์ชันล่าสุดนี้เทียบกับ Version 3 จะพบว่าเราได้ย้าย for statement ชั้นในสำหรับตัวแปร nbr_pos เข้าไปอยู่ภายใน generator expression ต่อไปนี้

# The code below is a generator expression
(
    table[pos] > table[nbr_pos]
    for nbr_pos in adjacent_cells(table.shape, pos)
)

จากนั้น generator expression ดังกล่าวก็ถูกป้อนเป็น input argument ของฟังก์ชัน all ( ⁠ซึ่งเป็นฟังก์ชันที่สามารถเรียกใช้งานได้โดยไม่จำเป็นต้อง import ก่อน ⁠) หากพิจารณา documentation ของฟังก์ชัน all จะพบว่า input argument สามารถเป็นวัตถุ iterable ใดก็ได้ รวมถึง generator expression ที่เราเขียนไว้ข้างต้นด้วย

สิ่งที่น่าสนใจเกี่ยวกับโค้ด Version 4 a นี้ก็คือ เมื่ออ่านตาม syntax ของโค้ดข้างต้นจะได้ความว่า

“ให้พิจารณาช่อง pos แต่ละตำแหน่งในตาราง แล้วหาว่ามีช่องใดบ้างที่มีค่ามากกว่าช่องเพื่อนบ้าน nbr_pos ทุกช่อง”

ซึ่งมีความหมายสอดคล้องและคล้ายคลึงกับนิยามของโจทย์ที่กำหนดไว้ในช่วงต้นของบทความนี้อย่างมาก

Version 4 Part B. Using generator expression with `table_cells`

ในฟังก์ชัน local_maxima ( ⁠Version 3 ⁠) สังเกตว่าเราเรียกใช้ฟังก์ชัน table_cells ( ⁠Version 3 ⁠) ซึ่งให้ผลลัพธ์ออกมาเป็น list ของตำแหน่งทุกช่องในตาราง แล้วจากนั้นเราจึงไล่ดูช่องตารางทีละช่องโดยจาก list ดังกล่าวโดยใช้ for statement

แต่จากบทความที่แล้วเราได้กล่าวไว้ว่า วัตถุที่ใช้คู่กับ for statement สามารถเป็นวัตถุ iterable ใดก็ได้ ไม่จำเป็นต้องเป็นโครงสร้างข้อมูลอย่างเช่น list เสมอไป ฉะนั้นแล้วเราจะลองแก้ไขฟังก์ชัน table_cells ให้รีเทิร์นค่าออกมาเป็นวัตถุ iterator โดยเปลี่ยนจาก list comprehension ให้กลายเป็น generator expression ซึ่งออกมาเป็นโค้ดดังต่อไปนี้

from typing import Iterator, Tuple
import numpy as np

def table_cells(shape: Tuple[int, int]) -> List[Tuple[int, int]]:
def table_cells(shape: Tuple[int, int]) -> Iterator[Tuple[int, int]]:
    """
    An iterator of indices for all cells in a table with a given shape
    which is a tuple of number of rows and columns, respectively.
    """
    n_rows, n_cols = shape
    return [(i, j) for i in range(n_rows) for j in range(n_cols)]
    return ((i, j) for i in range(n_rows) for j in range(n_cols))

แล้วการเขียนฟังก์ชันที่รีเทิร์นค่าเป็นวัตถุ iterator มันดีกว่าการสร้าง list โดยตรงอย่างไรบ้างในสถานการณ์เช่นนี้?

ฟังก์ชันที่สร้างวัตถุ iterator นั้นให้อิสระแก่ผู้เรียกใช้งานฟังก์ชันมากกว่าการสร้าง list นั่นเพราะว่าเราสามารถแปลงวัตถุ iterator ดังกล่าวให้กลายเป็น list ได้ด้วยการแคสด้วย list constructor ได้เสมอ ( ⁠list จะแปลงข้อมูล iterable ใด ๆ ที่รับมาให้กลายเป็น list ตามที่ระบุไว้ใน documentation ⁠)

ยกตัวอย่างเช่น
Python REPL
```
>>> shape = (2, 3)
>>> table_cells(shape)
<generator object table_cells.<locals>.<genexpr> at 0x7fc0dec0ffee>
>>> list(table_cells(shape))
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```
เมื่อพิจารณาในกรณีที่โปรแกรมเราใช้งาน for statement กับวัตถุ iterable เพื่อประมวลผลสมาชิกเพียงครั้งละ 1 ตัวเท่านั้น ( ⁠ดังเช่นที่เกิดขึ้นภายในฟังก์ชัน local_maxima Version 3 ซึ่งเราจะพิจารณาช่องตารางเพียงทีละ 1 ช่องเท่านั้น ⁠)

หมายความว่าเราไม่จำเป็นต้องคำนวณตำแหน่งของช่องตารางทุกช่องไว้ล่วงหน้าทั้งหมด ซึ่งยังมีผลให้เปลืองพื้นที่หน่วยความจำเพื่อเก็บลำดับของช่องตารางทั้งหมดดังกล่าวอีกด้วย ( ⁠ลองคิดดูว่าหากข้อมูลตารางมีขนาดใหญ่เช่น $10,\!000 \times 10,\!000$ เราจะเสียพื้นที่ไปกี่ MB โดยไม่มีเหตุจำเป็น ⁠)

ฉะนั้นแล้วเราเลือกที่จะไม่สร้างลำดับของช่องตารางทั้งหมดไว้ล่วงหน้า แต่จะค่อย ๆ สร้างลำดับดังกล่าวอย่างขี้เกียจที่สุด ( ⁠เรียกว่าเป็น lazily constructed sequence ⁠) กล่าวคือเราจะคำนวณสมาชิกตัวถัดไปของลำดับนี้ก็ต่อเมื่อถึงเวลาที่ for statement ขอเรียกดูสมาชิกตัวดังกล่าวไปประมวลผลเท่านั้น ซึ่งการเขียน generator expression ก็เป็นหนึ่งวิธีที่จะสร้าง lazily constructed sequence อย่างที่ปรารถนาได้

บทแทรก

ข้อตกลง ( ⁠convention ⁠) ของภาษา Python กำหนดไว้ว่า วัตถุประเภท iterator ถือว่าเป็น iterable ชนิดหนึ่งที่สามารถไล่เรียกดูสมาชิกทุกตัวได้เพียงรอบเดียวเท่านั้น ( ⁠เพราะในทางเทคนิคแล้ว method ชื่อ __iter__( ⁠self ⁠) มักจะรีเทิร์น self ออกมาโดยตรง ทำให้ไม่เกิด iterator อันใหม่สำหรับวัตถุที่เป็น iterator อยู่แล้วแต่แรก ⁠) ผู้ใช้งานจึงควรระมัดระวังการใช้งานวัตถุ iterator ซ้ำกันเกิน 1 รอบ ยกตัวอย่างเช่น

>>> shape = (2, 3)
>>> iterator = table_cells(shape)
>>> list(iterator)
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
>>> list(iterator)  # iterator has previously been fully consumed
[]

หมายเหตุ: อันที่จริงแล้ว เราอาจเรียกใช้ฟังก์ชัน numpy.ndindex( ⁠*shape ⁠) แทนการสร้างและเรียกใช้งานฟังก์ชัน table_cells( ⁠shape ⁠) ก็ได้

Version 4 Part C. Using generator function for `adjacent_cells`

เมื่อสักครู่เราแก้ไขฟังก์ชัน table_cells จากเดิมที่รีเทิร์นค่าเป็น list ให้กลายเป็นวัตถุ iterator ใน Version 4 b กันไปแล้ว เราจะนำไอเดียเดียวกันนี้มาประยุกต์ใช้กับฟังก์ชัน adjacent_cells ดูบ้าง แต่ถ้าจะเปลี่ยนโค้ดของ adjacent_cells ให้หันมาใช้ generator expression ก็คงยุ่งยากวุ่นวายเกินไป ดังนั้นเราจะใช้ความสามารถอีกอย่างหนึ่งของภาษา Python นั่นก็คือ generator function

เราจะข้ามไปดูโค้ดเวอร์ชันใหม่ของ adjacent_cells กันเลย

from typing import Iterator, Tuple
import numpy as np

def adjacent_cells(
        shape: Tuple[int, int],
        main_pos: Tuple[int, int],
) -> List[Tuple[int, int]]:
) -> Iterator[Tuple[int, int]]:
    """
    An iterator of indices of all adjacent cells to the cell indicated by
    the provided main pos, subject to the given shape of the table.
    """
    n_rows, n_cols = shape
    i, j = main_pos

    adjacent_indices = []
    for nbr_i in [i-1, i, i+1]:
        for nbr_j in [j-1, j, j+1]:
            if (nbr_i, nbr_j) == (i, j):
                continue
            if 0 <= nbr_i < n_rows and 0 <= nbr_j < n_cols:
                adjacent_indices.append((nbr_i, nbr_j))
                yield nbr_i, nbr_j

สิ่งที่เปลี่ยนไปจากโค้ดเดิมคือ

เราลบโค้ดส่วนที่เกี่ยวข้องกับลิสต์ adjacent_indices ออก และ
แทนที่เราจะ append ค่าของ ( ⁠nbr_i, nbr_j ⁠) ยัดลงใน list เราเลือกใช้ yield statement เพื่อคืนค่าตำแหน่งเหล่านั้นทีละตัวแทน

“แล้วคำสั่ง yield มันทำอะไร?” อธิบายโดยคร่าวคือ

เมื่อฟังก์ชันใดก็ตามมี keyword คำว่า yield ปรากฏอยู่ จะทำให้ฟังก์ชันดังกล่าวนั้นกลายร่างเป็น “generator function” ในทันที
การเรียกใช้งาน generator function จะให้ผลลัพธ์เป็น “generator object” ซึ่งจัดเป็นวัตถุ iterator ชนิดหนึ่ง แต่ว่า body ของฟังก์ชันดังกล่าวจะยังไม่เริ่มทำงานในทันที อันที่จริงแล้วเมื่อเรารัน generator expression ( ⁠ดังเช่นในฟังก์ชัน table_cell Version 4–B ⁠) ก็จะได้ผลลัพธ์เป็น generator object เช่นเดียวกัน
เมื่อมีการเริ่มไล่เรียกดูสมาชิกแต่ละตัวของวัตถุ generator/iterator ( ⁠ไม่ว่าจะเกิดจากการใช้งานวัตถุดังกล่าวกับ for statement หรือด้วยกระบวนการอื่นใดก็ตาม ⁠) แล้ว generator function ดังกล่าวจึงจะเริ่มทำงาน
และทุก ๆ ครั้งที่ generator function ทำงานถึงบรรทัดคำสั่ง yield ค่าของ นิพจน์ ใน yield statement จะถูกรีเทิร์นออกไปเป็นหนึ่งค่าของวัตถุ generator/iterator ส่วน generator function นี้จะหยุดการทำงานชั่วคราว จนกว่าจะมีการเรียกขอดูค่าถัดไปของวัตถุ generator/iterator นี้
วัตถุ generator/iterator จะสิ้นสุดลงก็ต่อเมื่อเกิดจากรีเทิร์นออกจากฟังก์ชัน ( ⁠ไม่ว่าจะด้วย return statement หรือว่าฟังก์ชันถูกรันจนจบถึงบรรทัดสุดท้ายก็ตาม ⁠)

ดังนั้นแล้ว ฟังก์ชัน adjacent_cells ( ⁠Version 4 c ⁠) นี้ก็จะสร้างวัตถุ iterator ซึ่งจะไล่เรียงช่องเพื่อนบ้านทุกอย่างที่อยู่ติดกับช่อง main_pos ที่รับเข้ามาเป็นข้อมูลนำเข้า ออกมาทีละช่องผ่านคำสั่ง yield นับว่าเป็น lazily constructed sequence เช่นเดียวกับฟังก์ชัน table_cells ( ⁠Version 4 b ⁠)

Summary of Version 4

เรานำโค้ด Version 4 ของทั้งสามฟังก์ชันมาประกอบรวมกัน กลายเป็นโปรแกรมดังต่อไปนี้

from typing import Iterator, List, Tuple
import numpy as np

def local_maxima(table: np.ndarray) -> List[Tuple[int, int]]:
    """
    Compute a list of table indices where each value is a local maximum
    in the given table data when comparing to adjacent table cells
    (either horizontally, vertically, or diagonally).
    """
    local_maxima_indices = []
    for pos in table_cells(table.shape):
        if all(table[pos] > table[nbr_pos]
               for nbr_pos in adjacent_cells(table.shape, pos)):
            local_maxima_indices.append(pos)

    return local_maxima_indices

def table_cells(shape: Tuple[int, int]) -> Iterator[Tuple[int, int]]:
    """
    An iterator of indices for all cells in a table with a given shape
    which is a tuple of number of rows and columns, respectively.
    """
    n_rows, n_cols = shape
    return ((i, j) for i in range(n_rows) for j in range(n_cols))

def adjacent_cells(
        shape: Tuple[int, int],
        main_pos: Tuple[int, int],
) -> Iterator[Tuple[int, int]]:
    """
    An iterator of indices of all adjacent cells to the cell indicated by
    the provided main pos, subject to the given shape of the table.
    """
    n_rows, n_cols = shape
    i, j = main_pos

    for nbr_i in [i-1, i, i+1]:
        for nbr_j in [j-1, j, j+1]:
            if (nbr_i, nbr_j) == (i, j):
                continue
            if 0 <= nbr_i < n_rows and 0 <= nbr_j < n_cols:
                yield nbr_i, nbr_j

[ ⁠insert bogo advert here ⁠]

ช่วงโปรโมชันพิเศษ เราขอแถมโจทย์ปัญหาอีก 1 ข้อ

ปัญหา ข. ตารางสองมิติมี local maximum อย่างน้อยหนึ่งช่องหรือไม่

ปัญหา

สมมติว่ามีข้อมูลเป็นตารางสองมิติ โดยที่แต่ละช่องมีค่าเป็นจำนวนจริง

จงหาว่าในข้อมูลตารางนี้ มีช่องอย่างน้อน 1 ช่องที่มีค่าเป็น local maximum หรือไม่ โดยให้ตอบเป็นค่าบูลีน True หรือ False

หมายเหตุ: โปรดสังเกตว่า มีกรณีที่คำตอบของคำถามข้างนี้สามารถเป็น False เช่น ในกรณีที่ตารางมีขนาดใหญ่กว่า 1 × 1 และทุกช่องในตารางมีค่าเท่ากัน

สังเกตว่าปัญหา ข. นี้ถามหา “existence of a solution” ซึ่งเป็นเพียง corollary ของปัญหา ก. ดั้งเดิมที่ให้ “enumerate all solutions” เท่านั้นเอง

สมมติว่ามีข้อมูลตารางอยู่ในตัวแปรชื่อ table แล้วต้องการค้นหาว่า table มี local maximum อย่างน้อยหนึ่งช่องหรือไม่ เราสามารถแก้ปัญหาง่าย ๆ ได้ดังนี้

สั่งรัน local_maxima( ⁠table ⁠) เพื่อคำนวณหาลิสต์ของ local maxima ทั้งหมดของข้อมูลในตัวแปร table
ตรวจสอบว่าผลลัพธ์จากข้อที่แล้ว เป็นสิลต์ที่มีความยาวอย่างน้อย 1 หรือไม่

# Note: In Python, a list evaluates to True under an if statement
# if and only if the list is non-empty.
if local_maxima(table):  # from version 4
    print("The table contains at least 1 local maximum.")
else:
    print("The table has no local maxima.")

แต่ประสิทธิภาพของวิธีการแก้ปัญหาข้างต้นนี้ยังไม่เป็นที่น่าพอใจ กล่าวคือโจทย์ข้อนี้เราต้องการทราบแค่ว่า “มีคำตอบอย่างน้อย 1 ช่องหรือไม่” เพียงแค่นั้น แต่ฟังก์ชัน local_maxima ที่เราเรียกใช้งานกลับรีเทิร์นค่าเป็น “ลิสต์ของคำตอบทั้งหมด” ซึ่งเป็นข้อมูลที่ มากเกินความจำเป็น ในการตอบคำถาม

จะดีหรือไม่ หากเราสามารถควบคุมการทำงานฟังก์ชัน local_maxima สร้างลำดับของช่องตารางที่เป็น local maximum ที่มีลักษณะเป็น lazily constructed sequence เช่นเดียวกับฟังก์ชัน table_cells และ adjacent_cells ( ⁠หมายความว่า ช่องตารางที่มีค่าเป็น local maximum แต่ละช่องจะถูกคำนวณก็ต่อเมื่อถึงเวลาที่ช่องดังกล่าวถูกเรียกดูเท่านั้น ⁠)

Version 5 – Changing return type of main function

ก่อนที่จะพูดถึงจุดประสงค์ในการดัดแปลงฟังก์ชัน local_maxima ให้รีเทิร์นค่าวัตถุ iterator แทนที่จะเป็น list แบบเก่า เราจะไปดูโค้ดที่มีการแก้ไขกันก่อน

from typing import Iterator, Tuple
import numpy as np

def local_maxima(table: np.ndarray) -> List[Tuple[int, int]]:
def local_maxima(data: np.ndarray) -> Iterator[Tuple[int, int]]:
    """
    An iterator of table indices where each value is a local maximum
    in the given table data when comparing to adjacent table cells
    (either horizontally, vertically, or diagonally).
    """
    local_maxima_indices = []
    for pos in table_cells(data.shape):
        if all(data[pos] > data[nbr_pos]
               for nbr_pos in adjacent_cells(data.shape, pos)):
            local_maxima_indices.append(pos)
            yield pos

    return local_maxima_indices

# table_cells and adjacent_cells remain unchanged

เราดัดแปลงให้ฟังก์ชัน local_maxima กลายเป็น generator function ใน Version 5 นี้ ( ⁠ด้วยกระบวนการที่คล้ายกับการดัดแปลงฟังก์ชัน adjacent_cells ใน Version 4 c ⁠)

หมายเหตุ: อย่าลืมว่าถ้าเราต้องการผลลัพธ์เป็นลิสต์แบบเก่า เราก็เพียงแค่นำวัตถุ iterator ไปใส่ใน list constructor ได้ดังที่เคยกล่าวไปแล้ว

แล้วการทำให้ฟังก์ชัน local_maxima รีเทิร์นเป็นวัตถุ iterator จะช่วยให้ตอบปัญหา ข. อย่างมีประสิทธิภาพได้อย่างไร? ลองพิจารณาจิ๊กซอว์สองชิ้นนี้

เราต้องการทราบเพียงแค่ว่าตารางข้อมูลสองมิติที่กำหนดให้ มี local maximum อย่างน้อย 1 ช่องหรือไม่ นั่นแปลว่าหากเราพบ local maximum ช่องแรกของตารางแล้ว ค่า local maximum ที่เหลือในตารางก็ไม่มีนัยยะสำคัญอีกต่อไปนี้
ฟังก์ชัน local_maxima ( ⁠Version 5 ⁠) เป็น generator function ที่จะค้นหาตำแหน่งของ local maximum ในข้อมูลตารางที่รับเป็นข้อมูลนำเข้า เพียงทีละ 1 ช่องเท่านั้น และจะมีหยุดการทำงานชั่วคราวจนกว่าจะมีการขอให้ดำเนินการค้นหา local maximum ตัวถัดไป

ดังนั้นแล้ว เราขอตั้งปัญหาในรูปทั่วไปดังต่อไปนี้

ปัญหาย่อย: ตรวจสอบว่าวัตถุ iterable ไม่ว่างเปล่า

ปัญหา

เราจะสามารถตรวจสอบว่าวัตถุ iterable ใด ๆ มีสมาชิกอย่างน้อย 1 ตัวหรือไม่ ได้อย่างไร โดยไม่จำเป็นต้องพยายามไล่ดูสมาชิกของวัตถุ iterable เกิน 1 ตัว

เราสามารถเขียน รูปแบบนิพจน์ any( ⁠True for _ in iterable ⁠) เพื่อทดสอบว่าวัตถุ iterable ในนิพจน์ดังกล่าวมีสมาชิกอย่างน้อย 1 ตัวหรือไม่ได้ รูปแบบนิพจน์ดังกล่าวอาศัยความสามารถ short-circuiting ของฟังก์ชัน any เพื่อยุติการทำงานเมื่อทราบผลลัพธ์เป็นที่แน่นอนแล้ว ยกตัวอย่างเช่น

>>> odd_digits = (i for i in range(10) if i % 2 == 1)
>>> any(True for _ in odd_digits)
True
>>> even_digits = (i for i in range(10) if i % 2 == 0)
>>> any(True for _ in even_digits)
True
>>> empty_sequence = (i for i in range(10) if i > 10)
>>> any(True for _ in empty_sequence)
False
>>> sequence_of_false = (i for i in range(10) if not i)
>>> any(True for _ in sequence_of_false)
True

มีคำถาม 2 คำถามที่อาจเป็นที่สนใจ แต่จะขออนุญาตทิ้งไว้ให้เป็นคำถามชวนคิดด้วยตนเอง

ทำไมเราจึงไม่ใช้รูปแบบนิพจน์ any( ⁠iterable ⁠) ไปเลย แต่กลับใช้รูปแบบนิพจน์ any( ⁠True for _ in iterable ⁠) แทน?
เราจะออกแบบการทดลองอย่างไร จึงจะทำให้มั่นใจได้ว่ารูปแบบนิพจน์ any( ⁠True for _ in iterable ⁠) จะยุติการทำงานทันทีหลังจากที่วัตถุ iterable ปล่อยค่าสมาชิกตัวแรกออกมาแล้ว?

มาพิจารณาปัญหา ข. กันอีกครั้งหนึ่ง นั่นคือเราอยากทราบว่าข้อมูลตารางที่กำหนดให้มี local maximum อย่างน้อย 1 ช่องหรือไม่? แต่ในคราวนี้ เรามีเครื่องมือพร้อมที่จะแก้ปัญหานี้อย่างมีประสิทธิภาพมากขึ้นกัน

โค้ดที่จะใช้ตรวจสอบว่า “ผลลัพธ์ของ local_maxima ( ⁠Version 5 ⁠) จะเป็นวัตถุ iterator ที่ไม่ว่างเปล่าหรือไม่” อย่างมีประสิทธิภาพ สามารถเขียนได้ดังนี้

if any(True for _ in local_maxima(table)):  # from version 5
    print("The table contains at least 1 local maximum.")
else:
    print("The table has no local maxima.")

บทสรุปส่งท้าย

บทความนี้ได้นำเสนอวิธีการต่าง ๆ ที่จะช่วยทำให้การทำงานกับ for statement และวัตถุ iterable จนกลายออกมาเป็นโค้ดที่อ่านง่าย ดูสละสลวย และมีประสิทธิภาพที่ดี

เมื่อใดก็ตามที่เราจะต้องทำงานกับ for statement หรือวัตถุ iterable เราอาจจะถามคำถามต่อไปนี้กับตนเอง

มีโค้ดส่วนใดที่เกี่ยวข้องกับ iteration ที่สามารถแยกออกเป็นฟังก์ชันต่างหากที่รีเทิร์นค่าออกมาเป็นวัตถุ iterator หรือไม่?
มีโค้ดส่วนใดที่เดิมรีเทิร์นค่าเป็น list แล้วจะมีโอกาสให้โค้ดมีประสิทธิภาพดีขึ้นหากเราเปลี่ยนให้รีเทิร์นเป็นวัตถุ iterator หรือไม่?
มีฟังก์ชันใน Python standard library ใดที่จะช่วยให้การทำงานกับวัตถุ iterable ง่ายขึ้นหรือไม่? ( ⁠คำแนะนำ: สามารถค้นหาฟังก์ชันเหล่านี้เพิ่มเติมได้จากแพ็กเกจ itertools และ more-itertools เป็นต้น ⁠)

และขอปิดท้ายด้วยโค้ดเวอร์ชันสุดท้ายของฟังก์ชันนี่เราได้พยายามปรับปรุงให้ดียิ่งขึ้น

from typing import Iterator, Tuple
import numpy as np

def local_maxima(data: np.ndarray) -> Iterator[Tuple[int, int]]:
    """
    An iterator of table indices where each value is a local maximum
    in the given table data when comparing to adjacent table cells
    (either horizontally, vertically, or diagonally).
    """
    for pos in table_cells(data.shape):
        if all(data[pos] > data[nbr_pos]
               for nbr_pos in adjacent_cells(data.shape, pos)):
            yield pos

def table_cells(shape: Tuple[int, int]) -> Iterator[Tuple[int, int]]:
    """
    An iterator of indices for all cells in a table with a given shape
    which is a tuple of number of rows and columns, respectively.
    """
    n_rows, n_cols = shape
    return ((i, j) for i in range(n_rows) for j in range(n_cols))

def adjacent_cells(
        shape: Tuple[int, int],
        main_pos: Tuple[int, int],
) -> Iterator[Tuple[int, int]]:
    """
    An iterator of indices of all adjacent cells to the cell indicated by
    the provided main pos, subject to the given shape of the table.
    """
    n_rows, n_cols = shape
    i, j = main_pos

    for nbr_i in [i-1, i, i+1]:
        for nbr_j in [j-1, j, j+1]:
            if (nbr_i, nbr_j) == (i, j):
                continue
            if 0 <= nbr_i < n_rows and 0 <= nbr_j < n_cols:
                yield nbr_i, nbr_j

>>> import numpy as np
>>> grid = np.array([[-4,  5, -2,  0],
...                  [ 8,  6,  9,  1],
...                  [-2, -3,  5,  2],
...                  [-3,  5,  4,  6]])
>>> list(local_maxima(grid))
[(1, 0), (1, 2), (3, 3)]
>>> any(True for _ in local_maxima(grid))
True
>>> plains = np.array([[5, 4, 5],
...                    [4, 5, 4]])
>>> any(True for _ in local_maxima(plains))
False

ป.ล. บทความนี้อาจจะยาวกว่าปกติ ต้องขออภัยมา ณ ที่นี้ ขอขอบคุณเพื่อน ๆ ที่เสียสละเวลาอ่าน draft ของบทความและให้ feedback เพื่อใช้ปรับปรุงบทความนี้ให้ดียิ่งขึ้น และขอขอบคุณผู้อ่านทุกอ่านที่อ่านบทความจนจบครับ

ในบทความถัด ๆ ไปจะขอเว้นจากการเขียนเกี่ยวกับ Python Iterables สักพักหนี่ง แต่อาจจะพูดถึงหัวข้อที่ advance มากขึ้น หรืออาจจะพูดถึงเรื่องบันเทิงอื่น ๆ บ้างตามประสา โปรดติดตามชม

Prepare Your Iterables – Python Iterables Part 2 of N

ปัญหา ก. การคำนวณหา local maxima ในข้อมูลตารางสองมิติ

Version 1 – Naïve approach

Version 2 – Local 3 × 3 grid approach

Version 3 – Prepare your iterables

Version 4 Part A: Improving main function

Version 4 Part B. Using generator expression with table_cells

Version 4 Part C. Using generator function for adjacent_cells

Summary of Version 4

ปัญหา ข. ตารางสองมิติมี local maximum อย่างน้อยหนึ่งช่องหรือไม่

Version 5 – Changing return type of main function

ปัญหาย่อย: ตรวจสอบว่าวัตถุ iterable ไม่ว่างเปล่า

บทสรุปส่งท้าย

Version 4 Part B. Using generator expression with `table_cells`

Version 4 Part C. Using generator function for `adjacent_cells`